The volcano3D package enables exploration of probes differentially expressed between three groups. Its main purpose is for the visualisation of differentially expressed genes in a three-dimensional volcano plot. These plots can be converted to interactive visualisations using plotly.
This vignette consists of a case study from the PEAC rheumatoid arthritis project (Pathobiology of Early Arthritis Cohort). The methodology has been published in ‘Lewis, Myles J., et al. “Molecular portraits of early rheumatoid arthritis identify clinical and treatment response phenotypes.” Cell reports 28.9 (2019): 2455-2470.’ (DOI: 10.1016/j.celrep.2019.07.091) with an interactive web tool available at https://peac.hpc.qmul.ac.uk.
Not yet publicly available:
install.packages("volcano3D")
library(devtools)
install_github("KatrionaGoldmann/volcano3D")
library(volcano3D)
The sample data used in this vignette can be loaded from the volcano3Ddata package. This is only possible after volcano3D is imported.
install.packages("volcano3Ddata")
Variables used in this vignette:
| Variable | Definition |
|---|---|
| contrast | the variable by which samples can be split into three groups. |
| groups | the three levels/categories of the contrast variable. |
| comparison | two groups between which a statistical test can be performed. There are three comparisons total. For the examples outlined in this vignette we look at comparisons: ‘lymphoid-myeloid’, ‘lymphoid-fibroid’ and ‘myeloid-fibroid’. |
| p | p value |
| FC | fold change |
| padj | adjusted p value |
| suffix | the tail word in a column name. In this package it states the statistical parameter (e.g. _logFC is the log FC variable) |
| prefix | the leading word in a column name. In this package it states the statistical test (e.g. LRT is the likelihood ratio test). |
| polar | A polar coordinates object, of S4 class, containing the expression data, sample data, pvalues and polar coordinates. |
This vignette will demonstrate the power of this package using a basic example from the PEAC data set. Here we will focus on the synovial data from this cohort.
First we will set up a polar object, using the polar_coords function, which contains:
To create the polar coordinates you must provide:
| Dataframe | Column(s) | Purpose | |
|---|---|---|---|
| sampledata | ID | Contains the sample IDs. This must be titled ‘ID’. | |
| contrast | A column containing the three-level factor used for contrasts. This column name must match the contrast variable | ||
| pvalues | pvalue | Three pvalues columns with the differential expression between groups. Column names are of format paste0(groups[i], "-", groups[j], " ", p_col_suffix) |
|
| fold change (optional, x3) | Three optional fold change columns. One for each comparison between groups. Column names are of format paste0(groups[i], "-", groups[j], " ", fc_col_suffix). |
||
| adjusted pvalue (optional, x3) | Three optional adjusted pvalue columns. One for each comparison between groups. Column names are of format paste0(groups[i], "-", groups[j], " ", padj_col_suffix). |
||
| multi-group pvalue (optional) | An optional multi-group pvalue column. This is typically generated using ANOVA or likelihood ratio tests between all three groups. Column names of form paste(multi_group_prefix, p_col_suffix) |
||
| multi-group fold change (optional) | An optional multi-group fold change column. This is typically generated using ANOVA or likelihood ratio tests between all three groups. Column names of form paste(multi_group_prefix, fc_col_suffix) |
||
| multi-group adjusted pvalue (optional) | An optional multi-group adjusted pvalue column. This is typically generated using ANOVA or likelihood ratio tests between all three groups. Column names of form paste(multi_group_prefix, padj_col_suffix) |
||
| label (optional) | An optional character column with probe labels for downstream plotting. |
with additional arguments:
| Variable | Purpose |
|---|---|
| contrast | The column name in sampledata which contains the variable by which samples can be split into three groups. |
| groups (optional) | The groups to be compared (in order). If NULL this defaults to levels(sampledata[, ‘contrasts’])`. |
| p_col_suffix (optional) | The suffix of column names with pvalues (default is ‘pvalue’). |
| padj_col_suffix (optional) | The suffix of column names with adjusted pvalues (defaults ‘padj’). If NULL the adjusted pvalue is calculated using p_col_suffix and padjust_method. |
| padjust_method (optional) | The method to calculate adjusted pvalues if not already provided. Must be one of c(‘holm’, ‘hochberg’, ‘hommel’, ‘bonferroni’, ‘BH’, BY’, ‘fdr’, ‘none’). Default is ‘BH’. |
| fc_col_suffix (optional) | The suffix of column names with log(fold change) values default is ‘logFC’). |
| multi_group_prefix (optional) | The prefix for columns containing statistics for a multi-group test (this is typically a likelihood ratio test or ANOVA). Default is NULL. |
| label_column (optional) | A column name in pvalues which is to be used to label markers of interest at plotting stage. If NULL the rownames will be used. |
A vignette with examples to calculate the pvalues dataframes can be found in the pvalues vignette.
Using the synovial biopsies from PEAC we can create a polar object for differentially expressed genes. First we will install the volcano3Ddata package if necessary:
install.packages("volcano3Ddata")
Then the data can be loaded and polar coordinates calculated:
library(volcano3Ddata)
data("syn_data")
syn_polar <- polar_coords(sampledata = syn_metadata,
contrast = "Pathotype",
pvalues = syn_pvalues,
expression = syn_rld,
p_col_suffix = "pvalue",
padj_col_suffix = "padj",
fc_col_suffix = "log2FoldChange",
multi_group_prefix = "LRT",
non_sig_name = "Not Significant",
significance_cutoff = 0.01,
label_column = NULL,
fc_cutoff = 0.1)
The pvalues slot should have at least two statistics for each comparison - pvalue and adjusted pvalue with an optional logarithmic fold change statistic also:
head(syn_polar@pvalues)
| Fibroid-Lymphoid logFC | Fibroid-Lymphoid padj | Fibroid-Lymphoid pvalue | LRT logFC | LRT padj | LRT pvalue | Lymphoid-Myeloid logFC | Lymphoid-Myeloid padj | Lymphoid-Myeloid pvalue | Myeloid-Fibroid logFC | Myeloid-Fibroid padj | Myeloid-Fibroid pvalue | label | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A2M | -0.0478326 | 1 | 0.7775361 | 0.2681094 | 0.4353436 | 0.2739571 | -0.2202768 | 1.0000000 | 0.1563045 | 0.2681094 | 1 | 0.1694654 | A2M |
| A2ML1 | 1.6854024 | 1 | 0.0002106 | 0.2445664 | 0.0000202 | 0.0000022 | -1.9299687 | 0.0464724 | 0.0000030 | 0.2445664 | 1 | 0.6348844 | A2ML1 |
| A4GALT | 1.0248506 | 0 | 0.0000000 | -0.4601841 | 0.0000000 | 0.0000000 | -0.5646665 | 0.2927492 | 0.0000192 | -0.4601841 | 1 | 0.0055222 | A4GALT |
| A4GNT | -1.1527859 | 1 | 0.0238302 | 1.3385294 | 0.1583022 | 0.0732984 | -0.1857435 | 1.0000000 | 0.6691773 | 1.3385294 | 1 | 0.0197330 | A4GNT |
| AAAS | -0.0245296 | 1 | 0.8574281 | -0.0303402 | 0.9711813 | 0.9075226 | 0.0548697 | 1.0000000 | 0.6608837 | -0.0303402 | 1 | 0.8469982 | AAAS |
| AACS | 0.1729592 | 1 | 0.2815302 | 0.1002448 | 0.2732939 | 0.1472059 | -0.2732040 | 1.0000000 | 0.0630182 | 0.1002448 | 1 | 0.5874802 | AACS |
The sig column in syn_polar@polar allows us to determine relative differences in expression between groups (in this case pathotypes). The ‘+’ indicates which pathotypes are significantly ‘up’ compared to others. For example:
genes labelled ‘Lymphoid+’ are significantly up in Lymphoid vs Myeloid and Lymphoid vs Fibroid.
genes up in two pathotypes such as ‘Lymphoid+Myeloid+’ are up in both Lymphoid and Myeloid, therefore Lymphoid vs Fibroid and Myeloid vs Fibroid are statistically significant.
genes which show no significant difference between pathotypes are classed according to non_sig_name
This gives us:
setNames(data.frame(table(syn_polar@polar$sig)), c("Significance", "Frequency"))
| Significance | Frequency |
|---|---|
| Fibroid+ | 885 |
| Fibroid+Lymphoid+ | 19 |
| Fibroid+Myeloid+ | 1504 |
| Lymphoid+ | 1793 |
| Lymphoid+Myeloid+ | 500 |
| Myeloid+ | 119 |
| Not Significant | 11415 |
If there is a fold change column previously provided, we can now investigate the comparisons between pathotypes using the volcano_trio function. This creates three ggplot outputs.
syn_plots <- volcano_trio(polar = syn_polar,
sig_names = c("not significant","significant",
"not significant","significant"),
colours = rep(c("grey60", "salmon"), 2),
text_size = 9,
shared_legend_size = 0.9,
label_rows = c("SLAMF6", "PARP16", "ITM2C"),
fc_line = FALSE)
syn_plots$All
These can be output to an interactive radar plot using radial_plotly. The labelRows variable allows any markers of interest to be labelled.
radial_plotly(polar = syn_polar, label_rows = c("SLAMF6", "PARP16", "ITM2C"))
By hovering over certain point you can also determine genes for future interrogation.
Similarly we can create a static ggplot image using radial_ggplot. We can also convert to a continuous colour scale by converting the angle to a hsv variable. This angle can be offset by continuous_shift.
radial_ggplot(polar = syn_polar,
label_rows = c("SLAMF6", "PARP16", "ITM2C"),
marker_size = 2.3,
marker_alpha=0.3,
colour_scale = "continuous",
continuous_shift=120)
We can then interrogate any one specific variable as a boxplot, to investigate these differences. This is build using ggplot2 so can easily be edited by the user to add features.
plot1 <- boxplot_trio(syn_polar,
value = "SLAMF6",
test = "polar_padj",
levels_order = c("Lymphoid", "Myeloid", "Fibroid"),
box_colours = c("blue", "red", "green3"),
step_increase = 0.1) +
labs(title="Adjusted Pvalues")
plot2 <- boxplot_trio(syn_polar,
value = "SLAMF6",
test = "polar_multi_padj",
levels_order = c("Lymphoid", "Myeloid", "Fibroid"),
box_colours = c("blue", "red", "green3")) +
labs(title="Overall Adjusted Pvalue")
plot3 <- boxplot_trio(syn_polar,
value = "PARP16",
test = "t.test",
levels_order = c("Myeloid", "Fibroid"),
box_colours = c("red", "green3")) +
labs(title="T-test")
ggarrange(plot1, plot2, plot3, ncol = 3)
The final thing we can look at is the 3D volcano plot which projects differential gene expression onto cylindrical coordinates.
p <- volcano3D(syn_polar,
label_rows = c("SLAMF6", "PARP16", "ITM2C"),
label_size = 10,
xy_aspectratio = 1,
z_aspectratio = 0.9)
p %>% layout(legend = list(x = 100, y = 0.5))
Again this produces an interactive plot. If you have the orca command-line utility installed, this can be used to save static images. To install follow the instructions here.
orca(p, "./volcano_3d_synovium.svg", format = "svg")
By default volcano3D generates a grid using 12 spokes. If you wish to override any of these variables it is possible to pass in your own grid. This is also possible for the 3D radial plots where the z elements can be left as NULL.
By manually creating a grid object it is possible to change the tick points on axes as well as the number of radial spokes (n_spokes).
The default ticks are calculated from r_vector and z_vector using the pretty function. This can be overwritten by passing tick points in r_axis_ticks and z_axis_ticks.
For example we can decrease the number of radial spokes to 4, while altering the z axis ticks:
four_grid = polar_grid(r_vector=syn_polar@polar$r_zscore,
z_vector=NULL,
r_axis_ticks = NULL,
z_axis_ticks = c(0, 8, 16, 32),
n_spokes = 4)
similarly to extend the radial axis and increasing the number of spokes in 2D we can apply:
new_grid = polar_grid(r_vector=NULL,
z_vector=-log10(syn_polar@pvalues$`LRT pvalue`),
r_axis_ticks = c(1, 2, 3),
z_axis_ticks = NULL,
n_spokes = 24)
By default the rownames of the pvalues object is used to label markers of interest. This can be altered by amending the label column in the polar object:
syn_polar@polar$label <- paste0(syn_polar@polar$label, "!")
radial_ggplot(polar = syn_polar,
label_rows = c("SLAMF6", "PARP16", "ITM2C"),
grid = new_grid,
marker_size = 2.3,
legend_size = 10) +
theme(legend.position=c(1.1, 1))
If you use this package please cite as:
Lewis, Myles J., et al. “Molecular portraits of early rheumatoid arthritis identify clinical and treatment response phenotypes.” Cell reports 28.9 (2019): 2455-2470.
or using:
citation("volcano3D")
##
## To cite package 'volcano3D' in publications use:
##
## Katriona Goldmann and Myles Lewis (2020). volcano3D: Interactive
## Plotting of Three-Way Differential Expression Analysis. R package
## version 1.0.0.9000. https://github.com/KatrionaGoldmann/volcano3D
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {volcano3D: Interactive Plotting of Three-Way Differential Expression
## Analysis},
## author = {Katriona Goldmann and Myles Lewis},
## year = {2020},
## note = {R package version 1.0.0.9000},
## url = {https://github.com/KatrionaGoldmann/volcano3D},
## }